accuracy constraint



Brownian Noise Reduction: Maximizing Privacy Subject to Accuracy Constraints

Neural Information Processing Systems

There is a disconnect between how researchers and practitioners handle privacy-utility tradeoffs. Researchers primarily operate from a privacy-first perspective, setting strict privacy requirements and minimizing risk subject to these constraints. Practitioners often desire an accuracy-first perspective, possibly satisfied with the greatest privacy they can get subject to obtaining sufficiently small error. Ligett et al. have introduced a "noise reduction" algorithm to address the latter perspective. The authors show that by adding correlated Laplace noise and progressively reducing it on demand, it is possible to produce a sequence of increasingly accurate estimates of a private parameter and only pay a privacy cost for the least noisy iterate released.
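The coupling behind this style of noise reduction can be made concrete with a short sketch. The Python snippet below is a minimal illustration of the correlated-Laplace construction the abstract describes: draw the least noisy value first, then walk toward more private levels, occasionally adding fresh noise. The function names, the `stop_when` stopping rule, and the sensitivity handling are illustrative assumptions rather than the authors' implementation, and the Brownian variant in the paper's title replaces the Laplace process with a Brownian one.

```python
import numpy as np

def correlated_laplace_noise(epsilons, sensitivity=1.0, rng=None):
    """Sample coupled noise Z_1, ..., Z_k with Z_j ~ Lap(sensitivity / eps_j)
    marginally, where eps_1 < ... < eps_k (noise shrinks along the sequence).
    The coupling is Markovian: the least noisy value is drawn first, and each
    noisier value either reuses its successor or adds an independent Laplace draw."""
    rng = np.random.default_rng() if rng is None else rng
    eps = np.asarray(epsilons, dtype=float)
    z = np.empty_like(eps)
    z[-1] = rng.laplace(scale=sensitivity / eps[-1])   # least noisy draw
    for j in range(len(eps) - 2, -1, -1):              # walk towards more noise
        keep_prob = (eps[j] / eps[j + 1]) ** 2
        if rng.random() < keep_prob:
            z[j] = z[j + 1]                            # reuse the same noise value
        else:
            z[j] = z[j + 1] + rng.laplace(scale=sensitivity / eps[j])
    return z

def noise_reduction_release(true_value, epsilons, stop_when, sensitivity=1.0):
    """Release increasingly accurate estimates until stop_when(estimate) holds;
    the ex-post privacy cost is the epsilon of the last iterate released."""
    noise = correlated_laplace_noise(epsilons, sensitivity)
    for eps_j, z_j in zip(epsilons, noise):
        estimate = true_value + z_j
        if stop_when(estimate):
            return estimate, eps_j                     # pay only for this iterate
    return true_value + noise[-1], epsilons[-1]
```

Because the iterates are coupled through a Markov process, releasing estimates up to level eps_j costs only eps_j ex post rather than the sum of all levels; in the papers the stopping rule is itself a private test (e.g., AboveThreshold), whereas here it is a plain callback for brevity.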


Reviews: Accuracy First: Selecting a Differential Privacy Level for Accuracy Constrained ERM

Neural Information Processing Systems

Summary: This paper considers a different perspective on differentially private algorithms where the goal is to give the strongest privacy guarantee possible, subject to a constraint on the acceptable accuracy. This is in contrast to the more common setting where we wish to achieve the highest possible accuracy for a given minimum level of privacy. The paper introduces the idea of ex-post differential privacy, which provides similar guarantees to differential privacy, but for which the bound on privacy loss is allowed to depend on the output of the algorithm. They then present a general method for private accuracy-constrained empirical risk minimization which combines ideas similar to the noise reduction techniques of Koufogiannis et al. and the Above Threshold algorithm. At a high level, their algorithm computes ERM models for increasing privacy parameters (using the noise reduction techniques, rather than independent instantiations of a private learning algorithm) and then uses the above threshold algorithm to find the strongest privacy parameter satisfying the accuracy constraint.
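To make the reviewed pipeline concrete, here is a hedged sketch of the outer search: iterate through noise-reduction outputs from most to least private and stop with an AboveThreshold-style test. The `iterates` generator, the `excess_error` callback, the noise scales, and the ex-post accounting are simplified placeholders, not the paper's algorithm or its proofs.

```python
import numpy as np

def accuracy_first_search(iterates, excess_error, alpha, eps_test, rng=None):
    """Walk through noise-reduction iterates (ordered from most to least private)
    and return the first model whose noisy excess empirical risk falls below the
    target alpha, using a below-threshold variant of the AboveThreshold test.
    `iterates` yields (model, eps_model) pairs; `excess_error(model)` is assumed
    to be a sensitivity-normalized query."""
    rng = np.random.default_rng() if rng is None else rng
    noisy_threshold = alpha + rng.laplace(scale=2.0 / eps_test)      # threshold noise
    for model, eps_model in iterates:
        noisy_error = excess_error(model) + rng.laplace(scale=4.0 / eps_test)
        if noisy_error <= noisy_threshold:
            # Rough ex-post cost: epsilon of the released iterate plus the test's epsilon.
            return model, eps_model + eps_test
    return None, None
```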


SMART: Automatically Scaling Down Language Models with Accuracy Guarantees for Reduced Processing Fees

Jo, Saehan, Trummer, Immanuel

arXiv.org Artificial Intelligence

The advancement of Large Language Models (LLMs) has significantly boosted performance in natural language processing (NLP) tasks. However, the deployment of high-performance LLMs incurs substantial costs, primarily due to the increased number of parameters aimed at enhancing model performance. This has made the use of state-of-the-art LLMs more expensive for end-users. AI service providers, such as OpenAI and Anthropic, often offer multiple versions of LLMs with varying prices and performance. However, end-users still face challenges in choosing the appropriate LLM for their tasks that balance result quality with cost. We introduce SMART, Scaling Models Adaptively for Reduced Token Fees, a novel LLM framework designed to minimize the inference costs of NLP tasks while ensuring sufficient result quality. It enables users to specify an accuracy constraint in terms of the equivalence of outputs to those of the most powerful LLM. SMART then generates results that deviate from the outputs of this LLM only with a probability below a user-defined threshold. SMART employs a profiling phase that evaluates the performance of multiple LLMs to identify those that meet the user-defined accuracy level. SMART optimizes the tradeoff between profiling overheads and the anticipated cost savings resulting from profiling. Moreover, our approach significantly reduces inference costs by strategically leveraging a mix of LLMs. Our experiments on three real-world datasets show that, based on OpenAI models, SMART achieves significant cost savings, up to 25.6x in comparison to GPT-4.
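A rough sketch of the accuracy-constrained selection idea is shown below. The model names, prices, and the `call_llm` and `equivalent` helpers are hypothetical stand-ins, and the point-estimate agreement check omits the statistical guarantees, profiling-cost optimization, and LLM mixing that SMART actually reasons about.

```python
import random

def profile_and_select(items, models, call_llm, equivalent, accuracy_target,
                       reference="gpt-4", sample_size=50):
    """Estimate, on a profiling sample, how often each cheaper model's output is
    equivalent to the reference model's, then pick the cheapest model whose
    estimated equivalence rate meets the user-defined accuracy target.
    `models` maps model name -> cost per call; `call_llm(model, item)` and
    `equivalent(a, b)` are assumed helpers, not a real provider API."""
    sample = random.sample(items, min(sample_size, len(items)))
    reference_outputs = {item: call_llm(reference, item) for item in sample}

    best, best_cost = reference, models[reference]
    for model, cost in models.items():
        if cost >= best_cost or model == reference:
            continue
        agreement = sum(equivalent(call_llm(model, item), reference_outputs[item])
                        for item in sample) / len(sample)
        if agreement >= accuracy_target:          # meets the accuracy constraint
            best, best_cost = model, cost
    return best
```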


Apparate: Rethinking Early Exits to Tame Latency-Throughput Tensions in ML Serving

Dai, Yinwei, Pan, Rui, Iyer, Anand, Li, Kai, Netravali, Ravi

arXiv.org Artificial Intelligence

Existing platform knobs (e.g., batch sizes) fail to ease this fundamental tension, and instead only enable users to harshly trade off one property for the other. This paper explores an alternate strategy to taming throughput-latency tradeoffs by changing the granularity at which inference is performed. We present Apparate, a system that automatically applies and manages early exits (EEs) in ML models, whereby certain inputs can exit with results at intermediate layers. Ramp predictions with sufficiently high confidence (subject to a threshold) exit the model, foregoing downstream layers and bringing corresponding savings in both compute and latency. The intuition is that models are often overparameterized (especially given recent model growth [31, 32, 48]), and certain 'easy' inputs may not require complete model processing for accurate results. Importantly, unlike existing platform knobs (e.g., batch size) that simply walk the steep latency-throughput tradeoff curve, EEs rethink the granularity of inference on a per-input basis. This, in turn, provides a path towards lowering request latencies without harming platform throughputs. To cope with the time-varying overhead and accuracy challenges that EEs bring, Apparate repurposes exits to provide continual feedback that powers several novel runtime monitoring and adaptation techniques.
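The early-exit mechanism itself is easy to sketch. The toy PyTorch module below attaches a classifier "ramp" after each block and exits an input as soon as a ramp's softmax confidence clears a threshold; the architecture, the single-input batching, and the fixed threshold are illustrative assumptions, whereas Apparate's contribution is the runtime management (ramp and threshold adaptation) layered on top of this mechanism.

```python
import torch
import torch.nn as nn

class EarlyExitNet(nn.Module):
    """Toy early-exit model: each ramp attached to an intermediate block produces
    a prediction, and an input exits as soon as a ramp is confident enough."""
    def __init__(self, dim=64, num_classes=10, num_blocks=4, threshold=0.9):
        super().__init__()
        self.blocks = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, dim), nn.ReLU()) for _ in range(num_blocks)])
        self.ramps = nn.ModuleList(
            [nn.Linear(dim, num_classes) for _ in range(num_blocks)])
        self.threshold = threshold

    def forward(self, x):
        # Per-input exiting: assumes a batch of size 1.
        for depth, (block, ramp) in enumerate(zip(self.blocks, self.ramps)):
            x = block(x)
            logits = ramp(x)
            confidence = torch.softmax(logits, dim=-1).max(dim=-1).values
            if confidence.item() >= self.threshold:
                return logits, depth          # remaining blocks are skipped
        return logits, len(self.blocks) - 1   # fell through to the final exit

# Example usage:
# model = EarlyExitNet()
# logits, exit_depth = model(torch.randn(1, 64))
```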


eFAT: Improving the Effectiveness of Fault-Aware Training for Mitigating Permanent Faults in DNN Hardware Accelerators

Hanif, Muhammad Abdullah, Shafique, Muhammad

arXiv.org Artificial Intelligence

Fault-Aware Training (FAT) has emerged as a highly effective technique for addressing permanent faults in DNN accelerators, as it offers fault mitigation without significant performance or accuracy loss, specifically at low and moderate fault rates. However, it leads to very high retraining overheads, especially when used for large DNNs designed for complex AI applications. Moreover, as each fabricated chip can have a distinct fault pattern, FAT must be performed for each faulty chip individually, considering its unique fault map, which further aggravates the problem. To reduce the overheads of FAT while maintaining its benefits, we propose (1) the concept of resilience-driven retraining amount selection, and (2) resilience-driven grouping and fusion of multiple fault maps (belonging to different chips) to perform consolidated retraining for a group of faulty chips. To realize these concepts, in this work, we present a novel framework, eFAT, that computes the resilience of a given DNN to faults at different fault rates and with different levels of retraining, and uses that knowledge to build a resilience map given a user-defined accuracy constraint. Then, it uses the resilience map to compute the amount of retraining required for each chip, considering its unique fault map. Afterward, it performs resilience- and reward-driven grouping and fusion of fault maps to further reduce the number of retraining iterations required for tuning the given DNN for the given set of faulty chips. We demonstrate the effectiveness of our framework for a systolic array-based DNN accelerator experiencing permanent faults in the computational array. Our extensive results for numerous chips show that the proposed technique significantly reduces the retraining cost when used for tuning a DNN for multiple faulty chips.
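The resilience-map lookup described above can be sketched as follows. The `evaluate_accuracy` callback, the exact-match lookup on fault rate, and the budget selection rule are simplifying assumptions (eFAT's actual resilience analysis and its grouping/fusion of fault maps are richer); the sketch only shows how a user-defined accuracy constraint maps to a per-chip retraining amount.

```python
def build_resilience_map(evaluate_accuracy, fault_rates, retrain_budgets):
    """Illustrative resilience map: estimated accuracy for each (fault rate,
    retraining budget) pair.  `evaluate_accuracy(fault_rate, budget)` is an
    assumed callback standing in for fault-injected retraining and evaluation."""
    return {(fr, rb): evaluate_accuracy(fr, rb)
            for fr in fault_rates for rb in retrain_budgets}

def retraining_amount_for_chip(resilience_map, chip_fault_rate, accuracy_constraint):
    """Pick the smallest retraining budget whose predicted accuracy at the chip's
    fault rate satisfies the user-defined accuracy constraint.  A real framework
    would map the chip's measured fault rate to the nearest profiled rate."""
    budgets = sorted({rb for (fr, rb) in resilience_map if fr == chip_fault_rate})
    for rb in budgets:
        if resilience_map[(chip_fault_rate, rb)] >= accuracy_constraint:
            return rb
    return budgets[-1] if budgets else None   # fall back to the largest budget
```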


Accuracy-Guaranteed Collaborative DNN Inference in Industrial IoT via Deep Reinforcement Learning

Wu, Wen, Yang, Peng, Zhang, Weiting, Zhou, Conghao, Shen, Xuemin

arXiv.org Artificial Intelligence

Collaboration among industrial Internet of Things (IoT) devices and edge networks is essential to support computation-intensive deep neural network (DNN) inference services which require low delay and high accuracy. Sampling rate adaptation, which dynamically configures the sampling rates of industrial IoT devices according to network conditions, is key to minimizing the service delay. In this paper, we investigate the collaborative DNN inference problem in industrial IoT networks. To capture the channel variation and task arrival randomness, we formulate the problem as a constrained Markov decision process (CMDP). Specifically, sampling rate adaptation, inference task offloading, and edge computing resource allocation are jointly considered to minimize the average service delay while guaranteeing the long-term accuracy requirements of different inference services. Since the CMDP cannot be directly solved by general reinforcement learning (RL) algorithms due to the intractable long-term constraints, we first transform the CMDP into an MDP by leveraging the Lyapunov optimization technique. Then, a deep RL-based algorithm is proposed to solve the MDP. To expedite the training process, an optimization subroutine is embedded in the proposed algorithm to directly obtain the optimal edge computing resource allocation. Extensive simulation results are provided to demonstrate that the proposed RL-based algorithm can significantly reduce the average service delay while preserving long-term inference accuracy with a high probability.
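The Lyapunov step that turns the long-term accuracy constraint into a per-step quantity can be illustrated with a virtual-queue update and a drift-plus-penalty reward; the weight `V` and the exact reward form below are assumptions for illustration, not the paper's formulation.

```python
def drift_plus_penalty_reward(delay, achieved_acc, required_acc, queue, V=10.0):
    """Illustrative Lyapunov-style reward shaping: a virtual queue accumulates
    accuracy-constraint violation, and the per-step reward trades off the delay
    penalty against the queue-weighted violation (drift-plus-penalty)."""
    violation = required_acc - achieved_acc
    reward = -(V * delay + queue * violation)
    new_queue = max(queue + violation, 0.0)    # virtual-queue update
    return reward, new_queue
```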